
Conversation

@wForget (Member) commented Sep 1, 2025

Which issue does this PR close?

Closes #2271.

Rationale for this change

Currently we prefer the JVM-based libhdfs to implement the native HDFS reader, which lets us support more Hadoop file systems. However, we currently hardcode support for the hdfs scheme only; this PR makes the supported Hadoop file system schemes configurable.

What changes are included in this PR?

Make the supported Hadoop filesystem schemes configurable.
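
As a rough sketch, parsing a comma-separated scheme list might look like the following Rust; the key name `fs.comet.libhdfs.schemes` comes from this PR, but the helper name and the hdfs-only default are assumptions, not Comet's actual code.

```rust
use std::collections::HashSet;

/// Hypothetical helper: parse a comma-separated scheme list, falling back
/// to the previously hardcoded behavior (hdfs only) when the key is unset.
fn parse_libhdfs_schemes(value: Option<&str>) -> HashSet<String> {
    match value {
        Some(v) => v
            .split(',')
            .map(|s| s.trim().to_ascii_lowercase())
            .filter(|s| !s.is_empty())
            .collect(),
        None => HashSet::from(["hdfs".to_string()]),
    }
}

fn main() {
    let schemes = parse_libhdfs_schemes(Some("hdfs, webhdfs, s3a"));
    assert!(schemes.contains("webhdfs"));
    assert!(schemes.contains("s3a"));
    // unset -> hdfs only, matching the old hardcoded behavior
    let default = parse_libhdfs_schemes(None);
    assert_eq!(default.len(), 1);
    assert!(default.contains("hdfs"));
}
```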

How are these changes tested?

After applying patch #2244, the newly added test cases ran successfully.

@codecov-commenter commented Sep 1, 2025

Codecov Report

❌ Patch coverage is 37.50000% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.90%. Comparing base (f09f8af) to head (d442a93).
⚠️ Report is 455 commits behind head on main.

Files with missing lines Patch % Lines
...la/org/apache/comet/objectstore/NativeConfig.scala 0.00% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2272      +/-   ##
============================================
+ Coverage     56.12%   57.90%   +1.77%     
- Complexity      976     1291     +315     
============================================
  Files           119      146      +27     
  Lines         11743    13376    +1633     
  Branches       2251     2374     +123     
============================================
+ Hits           6591     7745    +1154     
- Misses         4012     4374     +362     
- Partials       1140     1257     +117     

☔ View full report in Codecov by Sentry.

@wForget wForget marked this pull request as ready for review September 2, 2025 02:15
@comphead (Contributor) commented Sep 2, 2025

@parthchandra cc

}
.map_err(|e| ExecutionError::GeneralError(e.to_string()))?;
let (object_store, object_store_path): (Box<dyn ObjectStore>, Path) =
if is_hdfs_scheme(&url, object_store_configs) {
Contributor commented:

There is a little gotcha here when the scheme is s3a. In s3a's case, we replace s3a with s3 so that we can use the native object store implementation. If the user has s3a in the list of hdfs URLs because they want to use the hadoop-aws implementation, they will still end up with the native implementation.
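
The ordering concern above can be sketched as follows; the function name and return labels are hypothetical, not the PR's actual code. The point is that the configured libhdfs scheme list must be consulted before the s3a-to-s3 rewrite, otherwise an explicit s3a entry would still fall through to the native store.

```rust
use std::collections::HashSet;

// Hypothetical dispatch sketch: check the user-configured libhdfs scheme
// list *before* rewriting s3a -> s3, so an explicit "s3a" entry routes to
// the hadoop-aws path instead of the native object store.
fn resolve_store(scheme: &str, libhdfs_schemes: &HashSet<&str>) -> &'static str {
    if libhdfs_schemes.contains(scheme) {
        "libhdfs" // served via the JVM-based hadoop filesystem
    } else if scheme == "s3a" {
        "native-s3" // s3a rewritten to s3, served by the native object store
    } else {
        "native" // everything else goes to the native object store
    }
}

fn main() {
    let configured: HashSet<&str> = ["hdfs", "s3a"].into_iter().collect();
    // with s3a configured, it stays on the hadoop-aws path
    assert_eq!(resolve_store("s3a", &configured), "libhdfs");
    // without it, s3a falls through to the native s3 implementation
    assert_eq!(resolve_store("s3a", &HashSet::new()), "native-s3");
}
```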

wForget (Member Author) replied:

Thanks, I made some adjustments, could you take another look?

@@ -40,6 +41,8 @@ object NativeConfig {
// Azure Data Lake Storage Gen2 secure configurations (can use both prefixes)
"abfss" -> Seq("fs.abfss.", "fs.abfs."))

val COMET_LIBHDFS_SCHEMES_KEY = "fs.comet.libhdfs.schemes"
Contributor commented:

Should we make this a Comet conf (i.e add it in CometConf so it is automatically documented)?

wForget (Member Author) replied:

Thanks, moved to CometConf.

conf(s"spark.hadoop.$COMET_LIBHDFS_SCHEMES_KEY")
.doc(
"Defines filesystem schemes (e.g., hdfs, webhdfs) that the native side accesses " +
"via libhdfs, separated by commas.")
Contributor commented:

nit: perhaps we can mention that this configuration is valid only if Comet has been built with the hdfs feature flag enabled.

wForget (Member Author) replied:

Thanks, added this description.

Development

Successfully merging this pull request may close these issues.

Support additional hadoop file systems
4 participants